

Section: New Results

Deep Learning and Information Theory

  • Neural networks for computer vision We continued working on the topic of large-scale image segmentation with multiple object detection. The application target is the analysis of high-resolution multispectral satellite images covering the Earth. The challenges are numerous: finding good features to distinguish objects, obtaining fine-resolution segmentations while dealing with badly-registered ground truth, keeping the complexity scalable, and avoiding boundary effects when a big image is tiled into small ones that are processed independently and then merged back together. We propose to move to fully convolutional neural networks [45] to avoid the artifacts of patch-based approaches (a toy sketch of such a fully convolutional pipeline is given at the end of this item). We show the benefits of training first on imprecise ground truth, which is available in large amounts, and then refining on precise but scarce ground truth [13]. To further refine the segmentation, since convolutional networks tend to produce blurry outputs, we use recurrent neural networks to learn the partial differential equation (PDE) that would sharpen the segmentations, i.e. an iterative process that takes the edges of the original image into account to locate object boundaries precisely and sharpen them [67]. Finally, to benefit simultaneously from information at various resolutions, we design a new, more suitable architecture [68].

    We also started to work on medical image classification, with the long-term goal of automatic diagnosis, in collaboration with the Necker Hospital and the Inria start-up Therapixel, and on image labelling and representation, with the database editor company Armadillo, through the Adamme project (cf. Section 9.2.1).

    In collaboration with the University of Barcelona, we organize a series of challenges in video analysis of human behavior (ChaLearn Looking at People series). Looking at People (LAP) is an area of research that deals with the problem of automatically recognizing people in images, detecting and describing body parts, inferring their spatial configuration, and performing action/gesture recognition from still images or image sequences, often including multi-modal data. Any scenario where the visual or multi-modal analysis of people plays a key role is of interest to us within the field of Looking at People. We have been leaders in organizing challenges in this area since 2013 [10], [12], [36], with events sponsored by DARPA, NSF, Microsoft, Google, Facebook, NVIDIA, and others. In 2016 we organized follow-up competitions on gesture recognition [52] and face aging [37] to advance the state of the art in areas we had previously explored. We also organized two rounds of a completely new competition on personality trait evaluation from short video clips [47], [34]. The purpose of this study is to evaluate whether human first-impression judgements are consistent and reproducible. Such research could lead to devising coaching curricula to help job applicants present themselves better and hiring managers overcome unsubstantiated negative biases. The winners of the challenge used Deep Learning methods. The third-place winners teamed up with the organizers to put together a demonstration system, which was shown at the NIPS conference (https://nips.cc/Conferences/2016/Schedule?showEvent=6314). Work performed in collaboration with UC Berkeley on fingerprint verification using Deep Learning was also presented in this demonstration.
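
    As an illustration of the fully convolutional approach mentioned at the beginning of this item, here is a minimal sketch (assuming PyTorch; the toy architecture, channel counts and data are illustrative and not those of [13], [45] or [68]) of a network mapping a multispectral tile to per-pixel class scores, so that tiles of arbitrary size can be segmented without patch artifacts:

      # Toy fully convolutional segmenter (illustrative only, not the published model).
      import torch
      import torch.nn as nn

      class TinyFCN(nn.Module):
          def __init__(self, in_channels=4, num_classes=2):
              super().__init__()
              self.encoder = nn.Sequential(
                  nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
                  nn.MaxPool2d(2),                                     # 1/2 resolution
                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                  nn.MaxPool2d(2),                                     # 1/4 resolution
              )
              self.decoder = nn.Sequential(                            # learned upsampling
                  nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                  nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(32, num_classes, 1),                       # per-pixel class scores
              )

          def forward(self, x):                      # x: (batch, channels, H, W)
              return self.decoder(self.encoder(x))   # (batch, num_classes, H, W)

      model = TinyFCN()
      tile = torch.randn(1, 4, 128, 128)             # one 4-band 128x128 tile
      print(model(tile).shape)                       # torch.Size([1, 2, 128, 128])

    The coarse-to-fine strategy of [13] would then amount to training such a model with a pixel-wise cross-entropy loss, first on the abundant but imprecise ground truth and then fine-tuning on the scarce, precise one.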

  • Natural Gradients for Deep Learning Deep learning is now established as a state-of-the-art technology for tasks such as image or sequence processing. Nevertheless, much of the computational burden is spent on tuning hyper-parameters. Ongoing work, started during the TIMCO project, proposes, in the framework of Riemannian gradient descent, invariant algorithms for training neural networks that effectively reduce the number of arbitrary choices, e.g., affine transformations of the activation functions or shuffling of the inputs. Moreover, these Riemannian gradient descent algorithms perform as well as the state-of-the-art optimizers for neural networks, and are even faster for training complex models. The proposed approach is based on Amari's theory of information geometry and consists of practical, well-grounded approximations for computing the Fisher metric (a toy illustration of a natural-gradient update is given below). The scope of this framework goes beyond Deep Learning and encompasses any class of statistical models. This year's contribution is a new, simple framework (both theoretical and practical) that allowed us to release a simpler implementation of these techniques in Torch (one of the main deep learning libraries in use) and to demonstrate good performance on real data. We have also started to explore information-geometric criteria for automating the construction and selection of network architectures themselves, a major problem given the current trend towards highly complex, hand-built architectures (P. Wolinski's PhD).
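
    As a toy, self-contained illustration of a natural-gradient update (not the team's Torch implementation, which targets full neural networks), the following sketch trains a logistic regression using a diagonal, empirical approximation of the Fisher metric as a preconditioner:

      # Natural-gradient steps with a diagonal empirical Fisher (toy logistic regression).
      import torch

      torch.manual_seed(0)
      N, D = 256, 10
      X = torch.randn(N, D)
      y = (X[:, 0] > 0).float()                       # toy binary labels
      w = torch.zeros(D)                              # parameters

      lr, damping = 0.5, 1e-3
      for step in range(200):
          p = torch.sigmoid(X @ w)                    # predicted probabilities
          per_sample = (p - y).unsqueeze(1) * X       # per-sample gradients of the log-loss
          grad = per_sample.mean(dim=0)               # ordinary (Euclidean) gradient
          fisher_diag = (per_sample ** 2).mean(dim=0) # diagonal Fisher estimate
          w -= lr * grad / (fisher_diag + damping)    # precondition by the inverse metric

    Up to the damping term, preconditioning by the inverse of this diagonal metric makes the update invariant to rescaling of individual parameters, the simplest instance of the invariances targeted by the Riemannian algorithms above.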

  • Training dynamical systems online without backtracking with application to recurrent neural networks. The standard way to train recurrent neural networks and other systems exhibiting a temporal dynamical behavior involves “backpropagation through time”, which, as the name indicates, goes backward in time and is therefore unrealistic for online learning. Last year we proposed an algorithm that learns the parameters of a dynamical system in an online, memoryless setting, hence scalable and requiring no backpropagation through time, with gradient estimates guaranteed to be unbiased. This year we started to provide full convergence proofs for this algorithm (the first of their kind). Moreover, Corentin Tallec (PhD) proposed a considerably simpler version of the algorithm that keeps the same key mathematical properties and now allows for a simple “black-box” implementation on top of any existing recurrent network model (a toy illustration of the forward, online flavour of these methods is given below).
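
    To illustrate the forward, online flavour of these methods (and only that; this is not the team's algorithm), the toy sketch below trains a one-dimensional dynamical system by propagating parameter sensitivities forward in time, so that no past state is ever stored or revisited. For large networks the full sensitivity matrix maintained this way becomes prohibitively large; the team's contribution is precisely an unbiased, low-rank stochastic approximation of it, which this sketch does not implement:

      # Online, forward-in-time training of a toy dynamical system, with no
      # backpropagation through time: h_{t+1} = tanh(a*h_t + b*x_t), trained so
      # that the state h_t tracks a target y_t.
      import math

      a, b = 0.1, 0.1                    # parameters to learn
      h = 0.0                            # current state
      dh_da, dh_db = 0.0, 0.0            # forward sensitivities dh/da, dh/db
      lr = 0.01

      for t in range(5000):
          x = math.sin(0.1 * t)          # input signal
          y = 0.5 * x                    # target the state should track
          new_h = math.tanh(a * h + b * x)
          d_tanh = 1.0 - new_h ** 2      # derivative of tanh at a*h + b*x
          # Chain rule applied forward in time, using only current quantities:
          dh_da = d_tanh * (h + a * dh_da)
          dh_db = d_tanh * (x + a * dh_db)
          h = new_h
          err = h - y                    # instantaneous loss is err**2
          a -= lr * 2 * err * dh_da      # online gradient step on a
          b -= lr * 2 * err * dh_db      # online gradient step on b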